Skip to content

nvmeof: Add external client support #6251

Open
gadididi wants to merge 2 commits into
ceph:develfrom
gadididi:nvmeof/external_clients
Open

nvmeof: Add external client support #6251
gadididi wants to merge 2 commits into
ceph:develfrom
gadididi:nvmeof/external_clients

Conversation

@gadididi
Copy link
Copy Markdown
Contributor

@gadididi gadididi commented Apr 26, 2026

@nixpanic

Add external client support

Support allowHostNQNs in VAC for external (non-K8s) clients. Hosts can be managed via mutable parameters.
Add host reconciliation logic (add/remove) for NVMe-oF subsystems.
auto-generate subsystem NQN if not provided (for external Client- they have dedicated StorageClass, without SubsystemNQN label) .

In this design, each PVC gets its own NVMe-oF subsystem. That means:

  • One subsystem per namespace (PVC): When a new PVC is created, a unique NVMe-oF subsystem is also created for it (if no provided SubsystemNQN in the SC) .
  • Isolation: Each volume is isolated at the NVMe-oF layer - no sharing of subsystems between volumes by default - prevent external hosts access storage to each other, we dont want to leave the responsibly to the client!
  • Access control: Only hosts explicitly added to that subsystem (via allowHostNQNs in VAC) can access the volume.

issue: #6240

Checklist:

  • Commit Message Formatting: Commit titles and messages follow
    guidelines in the developer
    guide
    .
  • Reviewed the developer guide on Submitting a Pull
    Request
  • Pending release
    notes

    updated with breaking and/or notable changes for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@mergify mergify Bot added the component/nvme-of Issues and PRs related to NVMe-oF. label Apr 26, 2026
@gadididi gadididi force-pushed the nvmeof/external_clients branch 2 times, most recently from ba8d811 to 9e8560e Compare April 27, 2026 08:01
@gadididi gadididi changed the title nvmeof: external clients [WIP] nvmeof: Add external client support Apr 27, 2026
@gadididi gadididi requested a review from nixpanic April 27, 2026 09:15
@gadididi gadididi marked this pull request as ready for review April 27, 2026 09:20
@gadididi gadididi force-pushed the nvmeof/external_clients branch from 9e8560e to b22de20 Compare April 27, 2026 10:14
@gadididi gadididi self-assigned this Apr 27, 2026
func parseHostsParameters(params map[string]string) ([]string, error) {
allowHostNQNs, exists := params[AllowHostNQNs]
if !exists || allowHostNQNs == "" {
return nil, nil
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case allowHostNQNs == "", no hosts should be allowed anymore? Probably it makes sense to return an empty []string instead of nil.

Note that this is different from not having the AllowHostNQNs parameter at all. If the parameter is not given, the allowed hosts should not be modified.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if err := parseParam(nvmeof.RMbytesPerSecond, nvmeof.RMbytesPerSecond, &qos.RMbytesPerSecond); err != nil {
return nil, err
// parseHostsParameters parses the hosts yaml list parameter and validates its contents.
// It returns a slice of hostnames or an error if the YAML is invalid.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hostnames? HostNQN, I think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


if !hasAnyQoS {
if len(allowHostsList) == 0 {
return nil, nil
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, if all hosts are removed from the list, no host should still have access.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread internal/nvmeof/nvmeof.go Outdated
}

// Convert to sets for easier comparison
currentHostSet := make(map[string]bool)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels a little complicated... why not

  1. desiredHosts as function parameter, copy it to hostsToAdd
  2. loop though currentHosts and
    • if in desiredHosts already, remove it from hostsToAdd
    • if not in desiredHosts, add to to hostsToRemove
  3. remove all hostsToRemove
  4. add all hostsToAdd

you can use the slices package to do the checking with slices.Contains(desiredHosts, hostNQN), for example.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread internal/nvmeof/volume.go
v.SubsystemNQN = parameters["subsystemNQN"]

if v.SubsystemNQN == "" {
v.SubsystemNQN = "nqn.2016-06.io.ceph:subsystem." + volumeID
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the volume ID can be quite long, is there a limit on the name of the subsystemNQN?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it on our ceph cluster. and it can handle it.
I tried to test this entire feature on ODF, but I see some errors in ODF building . I will double check it there once the ODF Jenkins will be back to be stable.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. According to the CSI specification, a volume-id can be up to 128 characters. If that is acceptable for the subsystemnqn, all is good. Otherwise it may be create a uuid or some kind of hash of the volume-id?

@gadididi gadididi force-pushed the nvmeof/external_clients branch from b22de20 to d9bc798 Compare April 29, 2026 08:25
@gadididi gadididi requested review from Rakshith-R and nixpanic April 29, 2026 10:44
@gadididi
Copy link
Copy Markdown
Contributor Author

gadididi commented Apr 30, 2026

This PR is not tested yet. There is an issue in ODF Jenkins. Once it will resolve I will test this PR.

UPDATE:
I tested this PR. all goes smoothly.

@gadididi gadididi force-pushed the nvmeof/external_clients branch from d9bc798 to 6f8aef8 Compare April 30, 2026 09:52
//
// allowHostNQNs: |
// - nqn.2014-08.org.nvmexpress:host1
// - nqn.2014-08.org.nvmexpress:host2
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure this example is included in the StorageClass too

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it 🤔
the PVC may have VAC (with allowHostNQNs) , but we dont add this param into the StorageClass

@gadididi gadididi requested a review from nixpanic May 10, 2026 06:47
@gadididi gadididi added the keepalive This label can be used to disable stale bot activiity in the repo label May 10, 2026
Extracted common gateway connection logic
into `withGatewayConnection()` helper to avoid code duplication
between QoS and host management operations (in the next commit).

The helper manages:
- Secret retrieval
- NVMe-oF metadata extraction from volume
- Gateway connection with automatic cleanup
- Operation execution via callback function
Refactored modifyNVMeoFQoS to use the new helper.

Signed-off-by: gadi-didi <gadi.didi@ibm.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the NVMe-oF controller path to better support external (non-Kubernetes) clients by making subsystem host access mutable via VolumeAttributesClass parameters and by auto-generating a subsystem NQN when one is not provided.

Changes:

  • Auto-generate a default NVMe-oF subsystem NQN during CreateVolume when subsystemNQN is not provided.
  • Add gateway-side host listing and host reconciliation (add/remove) to match a desired allowHostNQNs list.
  • Extend ControllerModifyVolume() to handle both QoS updates and host allow-list updates, using a shared gateway-connection helper.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
internal/nvmeof/volume.go Makes subsystemNQN optional by generating a default based on volumeID.
internal/nvmeof/volume_test.go Adds coverage for default subsystem NQN generation when missing.
internal/nvmeof/nvmeof.go Adds ListHosts() and UpdateHostsForSubsystem() reconciliation logic.
internal/nvmeof/controller/controllerserver.go Parses allowHostNQNs, wires host/QoS updates into ControllerModifyVolume, and passes volumeID for default NQN generation.
Comments suppressed due to low confidence (1)

internal/nvmeof/controller/controllerserver.go:1121

  • The comment for AllowHostNQNs documents that users can set the value to "" to allow any host, but parseHostsParameters only accepts a YAML list of strings and will fail to parse "". Either implement explicit support for the wildcard value (and define how it maps to gateway behavior), or update the documentation/validation to disallow it.
// AllowHostNQNs is the VolumeAttributesClass mutable parameter key for specifying
// a YAML list of host NQNs to allow access to a volume. Use "*" to allow any host.
// Example:
//
//	allowHostNQNs: |
//	  - nqn.2014-08.org.nvmexpress:host1
//	  - nqn.2014-08.org.nvmexpress:host2
const AllowHostNQNs = "allowHostNQNs"

Comment on lines +630 to +635
if len(hosts) == 0 {
log.DebugLog(ctx, "No hosts to add or remove for volume %s", volumeID)

return nil
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great catch

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +549 to +566
// parseHostsParameters parses the hosts yaml list parameter and validates its contents.
// It returns a slice of hostNQNs or an error if the YAML is invalid.
// allowHostNQNs entry can be empty, in that case it means no hosts are allowed to access
// the subsystem (empty allow list).
func parseHostsParameters(params map[string]string) ([]string, error) {
allowHostNQNs, exists := params[AllowHostNQNs]
if !exists {
return nil, nil
}
if err := parseParam(nvmeof.RwMbytesPerSecond, nvmeof.RwMbytesPerSecond, &qos.RwMbytesPerSecond); err != nil {
return nil, err
var allowHostsList []string
if allowHostNQNs == "" {
return allowHostsList, nil
}
if err := parseParam(nvmeof.RMbytesPerSecond, nvmeof.RMbytesPerSecond, &qos.RMbytesPerSecond); err != nil {
return nil, err
}
if err := parseParam(nvmeof.WMbytesPerSecond, nvmeof.WMbytesPerSecond, &qos.WMbytesPerSecond); err != nil {
return nil, err
}

if !hasAnyQoS {
return nil, nil
if err := yaml.Unmarshal([]byte(allowHostNQNs), &allowHostsList); err != nil {
return nil, fmt.Errorf("invalid %s: must be a YAML list of strings: %w", AllowHostNQNs, err)
}

return qos, nil
return allowHostsList, nil
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (2)

internal/nvmeof/controller/controllerserver.go:561

  • parseHostsParameters returns a nil slice when allowHostNQNs is present but empty. Downstream callers use hostsList != nil to decide whether to reconcile, so setting allowHostNQNs: "" cannot remove all currently-allowed hosts (it is treated like the parameter is absent). Return a non-nil empty slice (or return an explicit present boolean) when the key exists but is empty so ControllerModifyVolume can reconcile to an empty allow-list.
func parseHostsParameters(params map[string]string) ([]string, error) {
	allowHostNQNs, exists := params[AllowHostNQNs]
	if !exists {
		return nil, nil
	}
	var allowHostsList []string
	if allowHostNQNs == "" {
		return allowHostsList, nil
	}

internal/nvmeof/controller/mutable_params_test.go:175

  • The test case for empty allowHostNQNs currently expects a nil slice, which matches the current parsing but also hides the need to distinguish “parameter present but empty list” from “parameter absent”. Update this test to expect a non-nil empty slice (or otherwise assert the presence semantics) so regressions in host reconciliation (clearing all hosts) are caught.

Comment thread internal/nvmeof/controller/controllerserver.go
Comment thread internal/nvmeof/nvmeof.go
Implements dynamic host access control for NVMe-oF volumes to support
external(non-Kubernetes) clients through VolumeAttributesClass.

Changes:
- Add `parseHostsParameters()` to parse YAML host list from VAC
- Implement `modifyNVMeoFHosts()` for runtime host updates
- Add `ListHosts()` and `UpdateHostsForSubsystem()` for host updateing
- Support allowHostNQNs parameter in CreateVolume and ControllerModifyVolume
- Auto-generate subsystem NQN from volumeID if not provided

Users can now specify external hosts in VAC mutable parameters:
  allowHostNQNs: |
    - nqn.2014-08.org.nvmexpress:host1
    - nqn.2014-08.org.nvmexpress:host2

Signed-off-by: gadi-didi <gadi.didi@ibm.com>
@gadididi gadididi force-pushed the nvmeof/external_clients branch 2 times, most recently from 852fd6c to b16bc7d Compare May 13, 2026 13:56
@gadididi gadididi requested a review from a team May 14, 2026 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component/nvme-of Issues and PRs related to NVMe-oF. keepalive This label can be used to disable stale bot activiity in the repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants